{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Named Entity Recognition with PyTorch"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this notebook we'll explore how we can use deep learning for sequence labelling tasks such as part-of-speech tagging or named entity recognition. We won't focus on getting state-of-the-art accuracy; rather, we'll implement a first neural network that gets the main concepts across."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For our experiments we'll reuse the NER data we already used for our CRF experiments. The Dutch CoNLL-2002 data has four kinds of named entities (people, locations, organizations and miscellaneous entities) and comes split into a training, a development and a test set. In NLTK, `ned.train` is the training set, `ned.testa` serves as the development set and `ned.testb` as the test set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[[('De', 'Art', 'O'),\n",
       "  ('tekst', 'N', 'O'),\n",
       "  ('van', 'Prep', 'O'),\n",
       "  ('het', 'Art', 'O'),\n",
       "  ('arrest', 'N', 'O'),\n",
       "  ('is', 'V', 'O'),\n",
       "  ('nog', 'Adv', 'O'),\n",
       "  ('niet', 'Adv', 'O'),\n",
       "  ('schriftelijk', 'Adj', 'O'),\n",
       "  ('beschikbaar', 'Adj', 'O'),\n",
       "  ('maar', 'Conj', 'O'),\n",
       "  ('het', 'Art', 'O'),\n",
       "  ('bericht', 'N', 'O'),\n",
       "  ('werd', 'V', 'O'),\n",
       "  ('alvast', 'Adv', 'O'),\n",
       "  ('bekendgemaakt', 'V', 'O'),\n",
       "  ('door', 'Prep', 'O'),\n",
       "  ('een', 'Art', 'O'),\n",
       "  ('communicatiebureau', 'N', 'O'),\n",
       "  ('dat', 'Conj', 'O'),\n",
       "  ('Floralux', 'N', 'B-ORG'),\n",
       "  ('inhuurde', 'V', 'O'),\n",
       "  ('.', 'Punc', 'O')],\n",
       " [('In', 'Prep', 'O'),\n",
       "  (\"'81\", 'Num', 'O'),\n",
       "  ('regulariseert', 'V', 'O'),\n",
       "  ('de', 'Art', 'O'),\n",
       "  ('toenmalige', 'Adj', 'O'),\n",
       "  ('Vlaamse', 'Adj', 'B-MISC'),\n",
       "  ('regering', 'N', 'O'),\n",
       "  ('de', 'Art', 'O'),\n",
       "  ('toestand', 'N', 'O'),\n",
       "  ('met', 'Prep', 'O'),\n",
       "  ('een', 'Art', 'O'),\n",
       "  ('BPA', 'N', 'B-MISC'),\n",
       "  ('dat', 'Pron', 'O'),\n",
       "  ('het', 'Art', 'O'),\n",
       "  ('bedrijf', 'N', 'O'),\n",
       "  ('op', 'Prep', 'O'),\n",
       "  ('eigen', 'Pron', 'O'),\n",
       "  ('kosten', 'N', 'O'),\n",
       "  ('heeft', 'V', 'O'),\n",
       "  ('laten', 'V', 'O'),\n",
       "  ('opstellen', 'V', 'O'),\n",
       "  ('.', 'Punc', 'O')],\n",
       " [('publicatie', 'N', 'O')]]"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import nltk\n",
    "\n",
    "train_sents = list(nltk.corpus.conll2002.iob_sents('ned.train'))\n",
    "dev_sents = list(nltk.corpus.conll2002.iob_sents('ned.testa'))\n",
    "test_sents = list(nltk.corpus.conll2002.iob_sents('ned.testb'))\n",
    "\n",
    "train_sents[:3]"
   ]
  },
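  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each token comes as a `(word, part-of-speech tag, IOB tag)` triple. The IOB tag says whether a token begins (`B-`) or continues (`I-`) a named entity of a given type, or lies outside any entity (`O`). The sketch below shows how a sequence of IOB tags can be grouped into entity spans. `iob_to_spans` is a hypothetical helper of our own, not part of NLTK, and it leniently treats an `I-` tag that does not continue an entity of the same type as the start of a new entity.\n",
    "\n",
    "```python\n",
    "def iob_to_spans(tags):\n",
    "    # Hypothetical helper: group IOB tags into (label, start, end) spans, end exclusive.\n",
    "    spans = []\n",
    "    label, start = None, None\n",
    "    for i, tag in enumerate(tags):\n",
    "        # An O tag, a B- tag or an I- tag of a different type closes the open entity.\n",
    "        if tag == 'O' or tag.startswith('B-') or (label is not None and tag[2:] != label):\n",
    "            if label is not None:\n",
    "                spans.append((label, start, i))\n",
    "                label, start = None, None\n",
    "        # Any non-O tag that does not continue an open entity starts a new one.\n",
    "        if tag != 'O' and label is None:\n",
    "            label, start = tag[2:], i\n",
    "    if label is not None:\n",
    "        spans.append((label, start, len(tags)))\n",
    "    return spans\n",
    "\n",
    "print(iob_to_spans(['O', 'B-PER', 'I-PER', 'O', 'B-LOC']))  # [('PER', 1, 3), ('LOC', 4, 5)]\n",
    "```\n",
    "\n",
    "Applied to `[t[2] for t in train_sents[0]]`, it would return the single span `('ORG', 20, 21)` for *Floralux*."
   ]
  },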
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we preprocess the data with the `torchtext` Python library, which has a number of handy utilities for preprocessing natural language. We convert our data into a `Dataset` of `Example`s, each of which has two fields: a text field and a label field. Both fields are sequential (the sequence of tokens and the sequence of labels). We don't have to tokenize the text anymore, as the CoNLL data has already been tokenized for us; that is why we pass the identity function `lambda x:x` as the tokenizer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'labels': <torchtext.data.field.Field object at 0x7fc89bc02e10>, 'text': <torchtext.data.field.Field object at 0x7fc89bc02eb8>}\n",
      "['De', 'tekst', 'van', 'het', 'arrest', 'is', 'nog', 'niet', 'schriftelijk', 'beschikbaar', 'maar', 'het', 'bericht', 'werd', 'alvast', 'bekendgemaakt', 'door', 'een', 'communicatiebureau', 'dat', 'Floralux', 'inhuurde', '.']\n",
      "['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-ORG', 'O', 'O']\n",
      "Train: 15806\n",
      "Dev: 2895\n",
      "Test: 5195\n"
     ]
    }
   ],
   "source": [
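    "# Note: this notebook uses the classic torchtext API (torchtext < 0.9). In torchtext 0.9 to 0.11\n",
    "# these classes live in torchtext.legacy.data instead; they were removed in torchtext 0.12.\n",
    "from torchtext.data import Example\n",
    "from torchtext.data import Field, Dataset\n",
    "\n",
    "# The CoNLL data is already tokenized, so we replace the default tokenizer (which splits\n",
    "# strings on whitespace) with the identity function. include_lengths=True makes each\n",
    "# batch return the sentence lengths together with the padded token indices.\n",
    "text_field = Field(sequential=True, tokenize=lambda x:x, include_lengths=True)\n",
    "label_field = Field(sequential=True, tokenize=lambda x:x, is_target=True)\n",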
    "\n",
    "def read_data(sentences):\n",
    "    examples = []\n",
    "    fields = {'sentence_labels': ('labels', label_field),\n",
    "              'sentence_tokens': ('text', text_field)}\n",
    "    \n",
    "    for sentence in sentences: \n",
    "        tokens = [t[0] for t in sentence]\n",
    "        labels = [t[2] for t in sentence]\n",
    "        \n",
    "        e = Example.fromdict({\"sentence_labels\": labels, \"sentence_tokens\": tokens},\n",
    "                             fields=fields)\n",
    "        examples.append(e)\n",
    "    \n",
    "    return Dataset(examples, fields=[('labels', label_field), ('text', text_field)])\n",
    "\n",
    "train_data = read_data(train_sents)\n",
    "dev_data = read_data(dev_sents)\n",
    "test_data = read_data(test_sents)\n",
    "\n",
    "print(train_data.fields)\n",
    "print(train_data[0].text)\n",
    "print(train_data[0].labels)\n",
    "\n",
    "print(\"Train:\", len(train_data))\n",
    "print(\"Dev:\", len(dev_data))\n",
    "print(\"Test:\", len(test_data))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
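    "Next, we build a vocabulary for both fields. The vocabulary maps every word and label to an index. By default, torchtext reserves index 0 for unknown items (`<unk>`) and index 1 for padding (`<pad>`). For the text field, we only keep the `VOCAB_SIZE` most frequent words in the training data; all other words are mapped to `<unk>`."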
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "VOCAB_SIZE = 20000\n",
    "\n",
    "text_field.build_vocab(train_data, max_size=VOCAB_SIZE)\n",
    "label_field.build_vocab(train_data)"
   ]
  },
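  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make this mapping concrete, here is a minimal pure-Python sketch of what building a vocabulary amounts to. It assumes behaviour similar to the legacy torchtext vocabulary described above: special tokens first (`<unk>` at index 0, `<pad>` at index 1), then the most frequent words, with ties broken alphabetically. `build_vocab` and `numericalize` are illustrative names, not torchtext's API.\n",
    "\n",
    "```python\n",
    "from collections import Counter\n",
    "\n",
    "def build_vocab(sentences, max_size=None, specials=('<unk>', '<pad>')):\n",
    "    # Illustrative sketch, not torchtext's implementation.\n",
    "    counter = Counter(token for sentence in sentences for token in sentence)\n",
    "    # Sort alphabetically, then (stably) by descending frequency,\n",
    "    # so words with equal counts end up in alphabetical order.\n",
    "    words = sorted(counter)\n",
    "    words.sort(key=lambda w: counter[w], reverse=True)\n",
    "    if max_size is not None:\n",
    "        words = words[:max_size]  # max_size does not count the special tokens\n",
    "    itos = list(specials) + words\n",
    "    stoi = {word: idx for idx, word in enumerate(itos)}\n",
    "    return itos, stoi\n",
    "\n",
    "def numericalize(tokens, stoi, unk_index=0):\n",
    "    # Words outside the vocabulary are mapped to the <unk> index.\n",
    "    return [stoi.get(token, unk_index) for token in tokens]\n",
    "\n",
    "itos, stoi = build_vocab([['de', 'kat', 'de'], ['de', 'hond']], max_size=2)\n",
    "print(itos)                               # ['<unk>', '<pad>', 'de', 'hond']\n",
    "print(numericalize(['de', 'kat'], stoi))  # [2, 0]\n",
    "```"
   ]
  },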
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Preparing for training"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we're on a machine with a CUDA-enabled GPU, we'd like to use this GPU for training and testing. If not, we'll just use the CPU. The check below allows us to write code that works on both CPU and GPU."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "cuda\n"
     ]
    }
   ],
   "source": [
    "import torch\n",
    "\n",
    "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
    "print(device)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another convenient class in `torchtext` is the `BucketIterator`. It groups examples of similar length into the same batch, which minimizes the amount of padding. It also maps the words and labels to their indices in the vocabularies and pads the sentences in each batch to the same length. Because the text field has `include_lengths=True`, every batch also contains the true length of each sentence, and `sort_within_batch=True` orders the sentences in a batch from longest to shortest, as `pack_padded_sequence` in our model expects by default."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "from torchtext.data import BucketIterator\n",
    "\n",
    "BATCH_SIZE = 32\n",
    "train_iter = BucketIterator(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True, \n",
    "                            sort_key=lambda x: len(x.text), sort_within_batch=True)\n",
    "dev_iter = BucketIterator(dataset=dev_data, batch_size=BATCH_SIZE, \n",
    "                          sort_key=lambda x: len(x.text), sort_within_batch=True)\n",
    "test_iter = BucketIterator(dataset=test_data, batch_size=BATCH_SIZE, \n",
    "                           sort_key=lambda x: len(x.text), sort_within_batch=True)"
   ]
  },
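  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The idea behind bucketing can be sketched in a few lines of plain Python. This is a simplified illustration, not torchtext's actual algorithm (which, as far as we know, sorts within larger random pools of examples rather than over the whole dataset), and it produces batch-major lists rather than the `(seq_len, batch_size)` tensors torchtext returns. The function name `bucket_batches` is hypothetical; the padding index 1 matches `<pad>` in our vocabularies.\n",
    "\n",
    "```python\n",
    "import random\n",
    "\n",
    "def bucket_batches(sequences, batch_size, pad_index=1, seed=0):\n",
    "    # Hypothetical helper: sort by length so each batch holds sequences of similar length.\n",
    "    order = sorted(range(len(sequences)), key=lambda i: len(sequences[i]))\n",
    "    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]\n",
    "    # Shuffle the order of the batches, not their contents.\n",
    "    random.Random(seed).shuffle(batches)\n",
    "    for batch in batches:\n",
    "        # Longest sequence first, as pack_padded_sequence expects by default.\n",
    "        batch = sorted(batch, key=lambda i: len(sequences[i]), reverse=True)\n",
    "        lengths = [len(sequences[i]) for i in batch]\n",
    "        # Pad every sequence to the length of the longest one in the batch.\n",
    "        padded = [sequences[i] + [pad_index] * (lengths[0] - len(sequences[i])) for i in batch]\n",
    "        yield padded, lengths\n",
    "\n",
    "for padded, lengths in bucket_batches([[5, 6, 7], [8], [9, 10], [11, 12, 13, 14]], batch_size=2):\n",
    "    print(lengths, padded)\n",
    "```\n",
    "\n",
    "Sentences of lengths 1 and 2 end up together, as do those of lengths 3 and 4, so only one padding item is needed per batch."
   ]
  },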
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pre-trained embeddings\n",
    "\n",
    "Pre-trained embeddings are generally an easy way of improving the performance of your model, particularly if you have little training data. Thanks to these embeddings, your model can use knowledge about the meaning and use of the words in your dataset that was learned from another, typically much larger, data set. This helps it generalize better between semantically related words.\n",
    "\n",
    "In this example, we make use of the popular FastText embeddings. These are high-quality pre-trained word embeddings that are available for a wide variety of languages. After downloading the `vec` file with the embeddings, we use them to initialize our embedding matrix. We do this by creating a matrix filled with zeros whose number of rows equals the number of words in our vocabulary and whose number of columns equals the number of dimensions in the FastText vectors (300). We then insert the FastText embedding of each word in the correct row: the row whose index corresponds to the index of the word in the vocabulary. Words without a FastText vector keep a row of zeros."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loading pre-trained embeddings\n",
      "Initializing embedding matrix\n"
     ]
    }
   ],
   "source": [
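    "import random\n",
    "import os\n",
    "import numpy as np\n",
    "\n",
    "EMBEDDING_PATH = os.path.join(os.path.expanduser(\"~\"), \"data/embeddings/fasttext/cc.nl.300.vec\")\n",
    "\n",
    "\n",
    "def load_embeddings(path, vocabulary=None):\n",
    "    \"\"\" Load the FastText embeddings from the embedding file.\n",
    "    If a vocabulary is given, only vectors for words in it are kept, which saves memory. \"\"\"\n",
    "    print(\"Loading pre-trained embeddings\")\n",
    "    \n",
    "    embeddings = {}\n",
    "    with open(path, encoding=\"utf-8\") as f:\n",
    "        # The first line of a .vec file holds the number of words and the vector dimension.\n",
    "        dim = int(f.readline().split()[1])\n",
    "        for line in f:\n",
    "            values = line.rstrip().split(\" \")\n",
    "            word = values[0]\n",
    "            # Skip malformed lines and, if requested, words outside the vocabulary.\n",
    "            if len(values) != dim + 1 or (vocabulary is not None and word not in vocabulary):\n",
    "                continue\n",
    "            # Parse the values as floats instead of keeping them as strings.\n",
    "            embeddings[word] = np.asarray(values[1:], dtype=np.float32)\n",
    "    \n",
    "    return embeddings\n",
    "    \n",
    "\n",
    "def initialize_embeddings(embeddings, vocabulary):\n",
    "    \"\"\" Use the pre-trained embeddings to initialize an embedding matrix.\n",
    "    Row i holds the vector of the word with index i; words without a vector keep zeros. \"\"\"\n",
    "    print(\"Initializing embedding matrix\")\n",
    "    embedding_size = len(next(iter(embeddings.values())))\n",
    "    embedding_matrix = np.zeros((len(vocabulary), embedding_size), dtype=np.float32)\n",
    "\n",
    "    for idx, word in enumerate(vocabulary.itos): \n",
    "        if word in embeddings:\n",
    "            embedding_matrix[idx,:] = embeddings[word]\n",
    "            \n",
    "    return embedding_matrix\n",
    "\n",
    "# Only keep the FastText vectors of words in our vocabulary.\n",
    "embeddings = load_embeddings(EMBEDDING_PATH, vocabulary=set(text_field.vocab.itos))\n",
    "embedding_matrix = initialize_embeddings(embeddings, text_field.vocab)\n",
    "embedding_matrix = torch.from_numpy(embedding_matrix).to(device)"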
   ]
  },
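  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The two steps above can be tried out on a toy example without downloading FastText. The sketch below assumes the standard text `.vec` layout (a header line with the number of vectors and their dimension, then one word per line followed by its values); `read_vec` and `build_matrix` are illustrative helpers, and the toy vocabulary and vectors are made up. It also reports the share of the vocabulary that received a pre-trained vector, which is a useful sanity check on the real data.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def read_vec(lines, keep=None):\n",
    "    # lines: any iterator over the lines of a .vec file (e.g. an open file object).\n",
    "    # The header holds the number of vectors and their dimension.\n",
    "    dim = int(next(lines).split()[1])\n",
    "    vectors = {}\n",
    "    for line in lines:\n",
    "        parts = line.rstrip().split(' ')\n",
    "        word, values = parts[0], parts[1:]\n",
    "        # Skip malformed lines and, optionally, words we do not need.\n",
    "        if len(values) != dim or (keep is not None and word not in keep):\n",
    "            continue\n",
    "        vectors[word] = np.asarray(values, dtype=np.float32)\n",
    "    return vectors, dim\n",
    "\n",
    "def build_matrix(itos, vectors, dim):\n",
    "    # Row idx holds the vector of vocabulary item idx; missing words keep zeros.\n",
    "    matrix = np.zeros((len(itos), dim), dtype=np.float32)\n",
    "    found = 0\n",
    "    for idx, word in enumerate(itos):\n",
    "        if word in vectors:\n",
    "            matrix[idx] = vectors[word]\n",
    "            found += 1\n",
    "    return matrix, found / len(itos)\n",
    "\n",
    "toy_vec = iter(['3 2', 'de 0.1 0.2', 'kat 0.3 0.4', 'hond 0.5 0.6'])\n",
    "itos = ['<unk>', '<pad>', 'de', 'kat', 'vis']\n",
    "vectors, dim = read_vec(toy_vec, keep=set(itos))\n",
    "matrix, coverage = build_matrix(itos, vectors, dim)\n",
    "print(matrix.shape, coverage)  # (5, 2) 0.4\n",
    "```"
   ]
  },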
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Model\n",
    "\n",
    "Next, we create our BiLSTM model. It consists of four layers:\n",
    "    \n",
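    "- An embedding layer that maps each word index (conceptually, a one-hot word vector) to a dense word embedding. These embeddings are either initialized with pre-trained vectors or learned from scratch. Note that `nn.Embedding.from_pretrained` freezes the pre-trained vectors by default (`freeze=True`), so in our setup they are not updated during training.\n",
    "- A bidirectional LSTM layer that reads the text both front to back and back to front. For each word, this LSTM produces two output vectors of dimensionality `hidden_dim`, which are concatenated to a vector of `2*hidden_dim`.\n",
    "- A dropout layer that helps us avoid overfitting by randomly setting a fraction (here 50%) of the values in the LSTM output to zero during training.\n",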
    "- A dense layer that projects the LSTM output to an output vector with a dimensionality equal to the number of labels.\n",
    "\n",
    "We initialize these layers in the `__init__` method, and put them together in the `forward` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch.nn as nn\n",
    "from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence\n",
    "\n",
    "class BiLSTMTagger(nn.Module):\n",
    "\n",
    "    def __init__(self, embedding_dim, hidden_dim, vocab_size, output_size, embeddings=None):\n",
    "        super(BiLSTMTagger, self).__init__()\n",
    "        \n",
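    "        # 1. Embedding Layer\n",
    "        if embeddings is None:\n",
    "            self.embeddings = nn.Embedding(vocab_size, embedding_dim)\n",
    "        else:\n",
    "            # from_pretrained freezes the embeddings by default (freeze=True);\n",
    "            # pass freeze=False to fine-tune them during training.\n",
    "            self.embeddings = nn.Embedding.from_pretrained(embeddings)\n",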
    "        \n",
    "        # 2. LSTM Layer\n",
    "        self.lstm = nn.LSTM(embedding_dim, hidden_dim, bidirectional=True, num_layers=1)\n",
    "        \n",
    "        # 3. Optional dropout layer\n",
    "        self.dropout_layer = nn.Dropout(p=0.5)\n",
    "\n",
    "        # 4. Dense Layer\n",
    "        self.hidden2tag = nn.Linear(2*hidden_dim, output_size)\n",
    "        \n",
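    "    def forward(self, batch_text, batch_lengths):\n",
    "        # batch_text: (seq_len, batch_size) word indices; batch_lengths: true sentence lengths,\n",
    "        # sorted from longest to shortest (sort_within_batch=True in the iterator).\n",
    "        embeddings = self.embeddings(batch_text)\n",
    "        \n",
    "        # Packing lets the LSTM skip the padding positions. Recent PyTorch versions\n",
    "        # require the lengths to be a CPU tensor, hence .cpu().\n",
    "        packed_seqs = pack_padded_sequence(embeddings, batch_lengths.cpu())\n",
    "        lstm_output, _ = self.lstm(packed_seqs)\n",
    "        # Back to a padded tensor of shape (seq_len, batch_size, 2*hidden_dim).\n",
    "        lstm_output, _ = pad_packed_sequence(lstm_output)\n",
    "        lstm_output = self.dropout_layer(lstm_output)\n",
    "        \n",
    "        # One score (logit) per label for every token: (seq_len, batch_size, output_size).\n",
    "        logits = self.hidden2tag(lstm_output)\n",
    "        return logits"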
   ]
  },
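  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To check the shapes that flow through `forward`, the snippet below runs the same sequence of operations (embedding, packing, BiLSTM, unpacking, dense layer) on a toy time-major batch of two sentences of lengths 3 and 2. The sizes are arbitrary toy values, not the ones used in this notebook, the layers are freshly initialized rather than taken from `BiLSTMTagger`, and dropout is left out.\n",
    "\n",
    "```python\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence\n",
    "\n",
    "torch.manual_seed(0)\n",
    "vocab_size, embedding_dim, hidden_dim, num_labels = 50, 8, 16, 5  # toy sizes\n",
    "embedding = nn.Embedding(vocab_size, embedding_dim)\n",
    "lstm = nn.LSTM(embedding_dim, hidden_dim, bidirectional=True)\n",
    "hidden2tag = nn.Linear(2 * hidden_dim, num_labels)\n",
    "\n",
    "# Time-major batch (seq_len, batch_size); index 1 pads the shorter sentence.\n",
    "batch_text = torch.tensor([[4, 7], [5, 9], [6, 1]])\n",
    "batch_lengths = torch.tensor([3, 2])  # sorted longest first\n",
    "\n",
    "packed = pack_padded_sequence(embedding(batch_text), batch_lengths)\n",
    "lstm_output, _ = lstm(packed)\n",
    "lstm_output, _ = pad_packed_sequence(lstm_output)\n",
    "logits = hidden2tag(lstm_output)\n",
    "\n",
    "print(lstm_output.shape)  # torch.Size([3, 2, 32])\n",
    "print(logits.shape)       # torch.Size([3, 2, 5])\n",
    "# The padded position (last step of the second sentence) is zero-filled after unpacking.\n",
    "print(lstm_output[2, 1].abs().sum().item())  # 0.0\n",
    "```"
   ]
  },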
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Training\n",
    "\n",
    "Then we need to train this model. This involves taking a number of decisions: \n",
    "\n",
    "- We pick a loss function (or `criterion`) to quantify how far the model predictions are from the correct output. For multiclass tasks such as named entity recognition, a standard loss function is the cross-entropy loss, which measures the difference between the predicted and the correct probability distribution over the labels. PyTorch's `CrossEntropyLoss` takes the raw output scores (logits) of the last layer, turns them into log-probabilities with a log-softmax, and then computes the negative log-likelihood of the correct labels. Its `ignore_index` parameter allows us to mask the padding items (label index 1, `<pad>`), so that they do not contribute to the loss. We also remove these masked items from the output afterwards, so they are not taken into account when we evaluate the model output.\n",
    "- Next, we need to choose an optimizer. For many NLP problems, the Adam optimizer is a good first choice. Adam is a variant of stochastic gradient descent that adapts the step size for each parameter separately, based on running averages of that parameter's recent gradients and squared gradients.\n",
    "\n",
    "Then the actual training starts. This happens in several epochs. During each epoch, we show all of the training data to the network, in the batches produced by the `BucketIterator`s we created above. Before we show the model a new batch, we set the gradients of the model to zero to avoid accumulating gradients across batches. Then the model computes a score for every label at every position in the batch, and we compute the loss of these scores with respect to the correct labels. `loss.backward()` computes the gradients for all model parameters, and `optimizer.step()` uses them to update the parameters. To evaluate the predictions, we pick the label with the highest score at every position using `torch.max`.\n",
    "\n",
    "When we have shown all the training data in an epoch, we compute precision, recall and F-score on the training data and the development data. Because these scores are micro-averaged over all tokens (including the very frequent `O` label), precision, recall and F-score are identical and equal to token-level accuracy, as the output below shows; this is more optimistic than the entity-level F-score that is usually reported for NER. We also compute the loss on the development data, but we do not optimize the model with it. Whenever the F-score on the development data is higher than the best score so far, we save the model. Once more than `patience` epochs have passed, we count the epochs in which the development F-score stays below the best score so far (the counter is reset whenever the model improves); when this count exceeds the patience, we stop training."
   ]
  },
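  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The snippet below is a small sanity check, with made-up logits and labels, of two claims above: with `ignore_index=1`, `CrossEntropyLoss` averages only over the positions whose label is not 1 (our `<pad>` index), and it equals a log-softmax followed by the negative log-likelihood loss.\n",
    "\n",
    "```python\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "import torch.nn.functional as F\n",
    "\n",
    "torch.manual_seed(0)\n",
    "logits = torch.randn(6, 4)               # 6 token positions, 4 labels (toy values)\n",
    "gold = torch.tensor([2, 3, 1, 2, 1, 3])  # label index 1 plays the role of <pad>\n",
    "\n",
    "masked_loss = nn.CrossEntropyLoss(ignore_index=1)(logits, gold)\n",
    "\n",
    "# The same loss computed by hand on the non-padding positions only.\n",
    "keep = gold != 1\n",
    "manual_loss = F.nll_loss(F.log_softmax(logits[keep], dim=1), gold[keep])\n",
    "\n",
    "print(torch.allclose(masked_loss, manual_loss))  # True\n",
    "```"
   ]
  },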
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch.optim as optim\n",
    "from tqdm import tqdm_notebook as tqdm\n",
    "from sklearn.metrics import precision_recall_fscore_support, classification_report\n",
    "\n",
    "\n",
    "def remove_predictions_for_masked_items(predicted_labels, correct_labels): \n",
    "    \"\"\" Drop all positions whose correct label is <unk> (index 0) or <pad> (index 1). \"\"\"\n",
    "\n",
    "    predicted_labels_without_mask = []\n",
    "    correct_labels_without_mask = []\n",
    "        \n",
    "    for p, c in zip(predicted_labels, correct_labels):\n",
    "        if c > 1:  # indices 0 and 1 are the special tokens\n",
    "            predicted_labels_without_mask.append(p)\n",
    "            correct_labels_without_mask.append(c)\n",
    "            \n",
    "    return predicted_labels_without_mask, correct_labels_without_mask\n",
    "\n",
    "\n",
    "def train(model, train_iter, dev_iter, batch_size, max_epochs, num_batches, patience, output_path):\n",
    "    criterion = nn.CrossEntropyLoss(ignore_index=1)  # we mask the <pad> labels\n",
    "    optimizer = optim.Adam(model.parameters())\n",
    "\n",
    "    train_f_score_history = []\n",
    "    dev_f_score_history = []\n",
    "    no_improvement = 0\n",
    "    for epoch in range(max_epochs):\n",
    "\n",
    "        total_loss = 0\n",
    "        predictions, correct = [], []\n",
    "        for batch in tqdm(train_iter, total=num_batches, desc=f\"Epoch {epoch}\"):\n",
    "            optimizer.zero_grad()\n",
    "            \n",
    "            text_length, cur_batch_size = batch.text[0].shape\n",
    "            \n",
    "            pred = model(batch.text[0].to(device), batch.text[1].to(device)).view(cur_batch_size*text_length, NUM_CLASSES)\n",
    "            gold = batch.labels.to(device).view(cur_batch_size*text_length)\n",
    "            \n",
    "            loss = criterion(pred, gold)\n",
    "            \n",
    "            total_loss += loss.item()\n",
    "\n",
    "            loss.backward()\n",
    "            optimizer.step()\n",
    "\n",
    "            _, pred_indices = torch.max(pred, 1)\n",
    "            \n",
    "            predicted_labels = list(pred_indices.cpu().numpy())\n",
    "            correct_labels = list(batch.labels.view(cur_batch_size*text_length).numpy())\n",
    "            \n",
    "            predicted_labels, correct_labels = remove_predictions_for_masked_items(predicted_labels, \n",
    "                                                                                   correct_labels)\n",
    "            \n",
    "            predictions += predicted_labels\n",
    "            correct += correct_labels\n",
    "\n",
    "        train_scores = precision_recall_fscore_support(correct, predictions, average=\"micro\")\n",
    "        train_f_score_history.append(train_scores[2])\n",
    "            \n",
    "        print(\"Total training loss:\", total_loss)\n",
    "        print(\"Training performance:\", train_scores)\n",
    "        \n",
    "        total_loss = 0\n",
    "        predictions, correct = [], []\n",
    "        model.eval()  # disable dropout while evaluating\n",
    "        with torch.no_grad():  # no gradients are needed for the development data\n",
    "            for batch in dev_iter:\n",
    "\n",
    "                text_length, cur_batch_size = batch.text[0].shape\n",
    "\n",
    "                pred = model(batch.text[0].to(device), batch.text[1].to(device)).view(cur_batch_size * text_length, NUM_CLASSES)\n",
    "                gold = batch.labels.to(device).view(cur_batch_size * text_length)\n",
    "                loss = criterion(pred, gold)\n",
    "                total_loss += loss.item()\n",
    "\n",
    "                _, pred_indices = torch.max(pred, 1)\n",
    "                predicted_labels = list(pred_indices.cpu().numpy())\n",
    "                correct_labels = list(batch.labels.view(cur_batch_size*text_length).numpy())\n",
    "\n",
    "                predicted_labels, correct_labels = remove_predictions_for_masked_items(predicted_labels, \n",
    "                                                                                       correct_labels)\n",
    "\n",
    "                predictions += predicted_labels\n",
    "                correct += correct_labels\n",
    "        model.train()  # switch back to training mode for the next epoch\n",
    "\n",
    "        dev_scores = precision_recall_fscore_support(correct, predictions, average=\"micro\")\n",
    "            \n",
    "        print(\"Total development loss:\", total_loss)\n",
    "        print(\"Development performance:\", dev_scores)\n",
    "        \n",
    "        dev_f = dev_scores[2]\n",
    "        if len(dev_f_score_history) > patience and dev_f < max(dev_f_score_history):\n",
    "            no_improvement += 1\n",
    "\n",
    "        elif len(dev_f_score_history) == 0 or dev_f > max(dev_f_score_history):\n",
    "            print(\"Saving model.\")\n",
    "            torch.save(model, output_path)\n",
    "            no_improvement = 0\n",
    "            \n",
    "        if no_improvement > patience:\n",
    "            print(\"Development F-score does not improve anymore. Stop training.\")\n",
    "            dev_f_score_history.append(dev_f)\n",
    "            break\n",
    "            \n",
    "        dev_f_score_history.append(dev_f)\n",
    "        \n",
    "    return train_f_score_history, dev_f_score_history"
   ]
  },
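  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an illustration of the per-parameter step sizes mentioned above, here is a plain-Python sketch of a single Adam update, using the default hyperparameters of `torch.optim.Adam` (learning rate 0.001, betas 0.9 and 0.999, epsilon 1e-8). It is a textbook version of the update rule for scalar parameters, not PyTorch's implementation, and `adam_step` is our own name. In the first step, both parameters move by almost exactly the learning rate, even though their gradients differ by four orders of magnitude.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "def adam_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):\n",
    "    # state holds the step counter and the running moment estimates per parameter.\n",
    "    state['t'] += 1\n",
    "    t = state['t']\n",
    "    new_params = []\n",
    "    for i, (p, g) in enumerate(zip(params, grads)):\n",
    "        state['m'][i] = beta1 * state['m'][i] + (1 - beta1) * g      # running mean of gradients\n",
    "        state['v'][i] = beta2 * state['v'][i] + (1 - beta2) * g * g  # running mean of squared gradients\n",
    "        m_hat = state['m'][i] / (1 - beta1 ** t)  # bias correction for the zero initialization\n",
    "        v_hat = state['v'][i] / (1 - beta2 ** t)\n",
    "        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))\n",
    "    return new_params\n",
    "\n",
    "state = {'t': 0, 'm': [0.0, 0.0], 'v': [0.0, 0.0]}\n",
    "params = adam_step([1.0, 1.0], [100.0, 0.01], state)\n",
    "print(params)  # both values are very close to 0.999\n",
    "```"
   ]
  },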
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When we test the model, we take the same steps as in the evaluation on the development data above: we get the predictions, remove the masked items and print a classification report. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "def test(model, test_iter, batch_size, labels, target_names): \n",
    "    \n",
    "    predictions, correct = [], []\n",
    "    model.eval()  # disable dropout\n",
    "    with torch.no_grad():  # no gradients are needed for testing\n",
    "        for batch in test_iter:\n",
    "\n",
    "            text_length, cur_batch_size = batch.text[0].shape\n",
    "\n",
    "            pred = model(batch.text[0].to(device), batch.text[1].to(device)).view(cur_batch_size * text_length, NUM_CLASSES)\n",
    "\n",
    "            _, pred_indices = torch.max(pred, 1)\n",
    "            predicted_labels = list(pred_indices.cpu().numpy())\n",
    "            correct_labels = list(batch.labels.view(cur_batch_size*text_length).numpy())\n",
    "\n",
    "            predicted_labels, correct_labels = remove_predictions_for_masked_items(predicted_labels, \n",
    "                                                                                   correct_labels)\n",
    "\n",
    "            predictions += predicted_labels\n",
    "            correct += correct_labels\n",
    "    \n",
    "    print(classification_report(correct, predictions, labels=labels, target_names=target_names))"
   ]
  },
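  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The scores printed during training and the classification report are computed per token. The sketch below contrasts that with entity-level scoring, where an entity only counts as correct if its type and both boundaries match. The toy tags and the hand-written spans, given as `(label, start, end)` tuples with an exclusive end, are made up for illustration; `token_accuracy` and `entity_scores` are hypothetical helpers, not part of scikit-learn.\n",
    "\n",
    "```python\n",
    "def token_accuracy(gold_tags, pred_tags):\n",
    "    # With one label per token, micro-averaged precision, recall and F-score\n",
    "    # over all labels all reduce to this accuracy.\n",
    "    return sum(g == p for g, p in zip(gold_tags, pred_tags)) / len(gold_tags)\n",
    "\n",
    "def entity_scores(gold_spans, pred_spans):\n",
    "    # Exact-match entity scoring: label, start and end must all agree.\n",
    "    gold, pred = set(gold_spans), set(pred_spans)\n",
    "    tp = len(gold & pred)\n",
    "    precision = tp / len(pred) if pred else 0.0\n",
    "    recall = tp / len(gold) if gold else 0.0\n",
    "    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0\n",
    "    return precision, recall, f1\n",
    "\n",
    "gold_tags = ['O', 'O', 'B-ORG', 'I-ORG', 'O', 'B-PER']\n",
    "pred_tags = ['O', 'O', 'B-ORG', 'O', 'O', 'B-PER']\n",
    "gold_spans = [('ORG', 2, 4), ('PER', 5, 6)]\n",
    "pred_spans = [('ORG', 2, 3), ('PER', 5, 6)]\n",
    "\n",
    "print(token_accuracy(gold_tags, pred_tags))   # 0.8333333333333334\n",
    "print(entity_scores(gold_spans, pred_spans))  # (0.5, 0.5, 0.5)\n",
    "```\n",
    "\n",
    "A single wrong `I-ORG` tag costs one token out of six, but it makes half of the entities wrong."
   ]
  },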
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can start the actual training. We set the embedding dimension to 300 (the dimensionality of the FastText embeddings) and pick a hidden dimensionality of 256 for each direction of the BiLSTM, which will therefore output 512-dimensional vectors. The number of classes (the size of the label vocabulary) becomes the dimensionality of the output layer. Finally, we also compute the number of batches in an epoch, so that we can show a progress bar."
   ]
  },
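  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The numbers in the paragraph above follow from simple arithmetic. The sketch assumes the number of batches is the number of training sentences divided by the batch size, rounded up (the last batch is only partially filled), which is consistent with the `max=494` in the progress bars below; `num_batches` is our own helper name, and the notebook's own computation may be written differently.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "def num_batches(num_examples, batch_size):\n",
    "    # The last, partially filled batch also counts.\n",
    "    return math.ceil(num_examples / batch_size)\n",
    "\n",
    "hidden_dim = 256  # per direction, as implied by the 512-dimensional BiLSTM output\n",
    "print(2 * hidden_dim)          # 512: forward and backward outputs are concatenated\n",
    "print(num_batches(15806, 32))  # 494 training batches per epoch\n",
    "```"
   ]
  },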
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "ba140815a7334dd28cbf7296dc7cc749",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(IntProgress(value=0, description='Epoch 0', max=494, style=ProgressStyle(description_width='ini…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Total training loss: 222.35497660934925\n",
      "Training performance: (0.9244241132231894, 0.9244241132231894, 0.9244241132231894, None)\n",
      "Total development loss: 24.117380052804947\n",
      "Development performance: (0.9296043728606681, 0.9296043728606681, 0.9296043728606681, None)\n",
      "Saving model.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/opt/anaconda3/lib/python3.7/site-packages/torch/serialization.py:251: UserWarning: Couldn't retrieve source code for container of type BiLSTMTagger. It won't be checked for correctness upon loading.\n",
      "  \"type \" + obj.__name__ + \". It won't be checked \"\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "e1b4716ec9dd4d3dbe903309888ee6a5",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(IntProgress(value=0, description='Epoch 1', max=494, style=ProgressStyle(description_width='ini…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Total training loss: 71.356663974002\n",
      "Training performance: (0.9591253627050393, 0.9591253627050393, 0.9591253627050393, None)\n",
      "Total development loss: 18.751499708741903\n",
      "Development performance: (0.9388913949107119, 0.9388913949107119, 0.9388913949107119, None)\n",
      "Saving model.\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "b88bc0032ec148ffb3e787d5b152b0eb",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(IntProgress(value=0, description='Epoch 2', max=494, style=ProgressStyle(description_width='ini…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Total training loss: 50.72972834017128\n",
      "Training performance: (0.9680819565346124, 0.9680819565346124, 0.9680819565346124, None)\n",
      "Total development loss: 17.262520626187325\n",
      "Development performance: (0.9425531350333006, 0.9425531350333006, 0.9425531350333006, None)\n",
      "Saving model.\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "4f170531428d428688eeca047775ebf4",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(IntProgress(value=0, description='Epoch 3', max=494, style=ProgressStyle(description_width='ini…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Total training loss: 41.36486494448036\n",
      "Training performance: (0.9728291980024082, 0.9728291980024082, 0.9728291980024082, None)\n",
      "Total development loss: 16.504801526665688\n",
      "Development performance: (0.9443309363971661, 0.9443309363971661, 0.9443309363971661, None)\n",
      "Saving model.\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "1b679486eaf3421bbf20686375c8c41b",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(IntProgress(value=0, description='Epoch 4', max=494, style=ProgressStyle(description_width='ini…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Total training loss: 35.585738136898726\n",
      "Training performance: (0.9755877302066679, 0.9755877302066679, 0.9755877302066679, None)\n",
      "Total development loss: 16.711221787147224\n",
      "Development performance: (0.9458964629713164, 0.9458964629713164, 0.9458964629713164, None)\n",
      "Saving model.\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "041d9f3c52464656bae1af40ac9960b8",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(IntProgress(value=0, description='Epoch 5', max=494, style=ProgressStyle(description_width='ini…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Total training loss: 30.99800857482478\n",
      "Training performance: (0.9784597619470599, 0.9784597619470599, 0.9784597619470599, None)\n",
      "Total development loss: 16.387646575458348\n",
      "Development performance: (0.9469578369198928, 0.9469578369198928, 0.9469578369198928, None)\n",
      "Saving model.\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "eb16a30222b04d49a285674d79abe5a0",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(IntProgress(value=0, description='Epoch 6', max=494, style=ProgressStyle(description_width='ini…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Total training loss: 27.11286850180477\n",
      "Training performance: (0.9801326464144016, 0.9801326464144016, 0.9801326464144016, None)\n",
      "Total development loss: 17.557277165353298\n",
      "Development performance: (0.9460822034123172, 0.9460822034123172, 0.9460822034123172, None)\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "b8bc31674a8c4d13bc3e54164f85c98e",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(IntProgress(value=0, description='Epoch 7', max=494, style=ProgressStyle(description_width='ini…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Total training loss: 24.422367892228067\n",
      "Training performance: (0.9820917471032945, 0.9820917471032945, 0.9820917471032945, None)\n",
      "Total development loss: 16.09378209663555\n",
      "Development performance: (0.9491867222119033, 0.9491867222119033, 0.9491867222119033, None)\n",
      "Saving model.\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "5cc4cc80f7fd453ca8cdc695b90f37bd",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(IntProgress(value=0, description='Epoch 8', max=494, style=ProgressStyle(description_width='ini…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Total training loss: 22.412145966896787\n",
      "Training performance: (0.9829355914806261, 0.9829355914806261, 0.9829355914806261, None)\n",
      "Total development loss: 19.413369961082935\n",
      "Development performance: (0.9440390585613077, 0.9440390585613077, 0.9440390585613077, None)\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "16a0f3a493274d578645e7801e49ea6b",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(IntProgress(value=0, description='Epoch 9', max=494, style=ProgressStyle(description_width='ini…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Total training loss: 20.09223960270174\n",
      "Training performance: (0.9845344545113598, 0.9845344545113598, 0.9845344545113598, None)\n",
      "Total development loss: 15.523855119943619\n",
      "Development performance: (0.948072279565898, 0.948072279565898, 0.948072279565898, None)\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "9282699e0d6a41f6a5c479f3e4b324c8",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(IntProgress(value=0, description='Epoch 10', max=494, style=ProgressStyle(description_width='in…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Total training loss: 17.925453061121516\n",
      "Training performance: (0.9860395570557233, 0.9860395570557233, 0.9860395570557233, None)\n",
      "Total development loss: 16.789274506270885\n",
      "Development performance: (0.9485233634940431, 0.9485233634940431, 0.9485233634940432, None)\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "0dd2a7a78ddb40c5b2bc4f5211866565",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(IntProgress(value=0, description='Epoch 11', max=494, style=ProgressStyle(description_width='in…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Total training loss: 16.409397734270897\n",
      "Training performance: (0.9867205542725174, 0.9867205542725174, 0.9867205542725174, None)\n",
      "Total development loss: 14.920704647898674\n",
      "Development performance: (0.950195027463051, 0.950195027463051, 0.950195027463051, None)\n",
      "Saving model.\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "4f8b52d4449940b6bafb77e162872878",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(IntProgress(value=0, description='Epoch 12', max=494, style=ProgressStyle(description_width='in…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Total training loss: 14.990085303143132\n",
      "Training performance: (0.9879493101202108, 0.9879493101202108, 0.9879493101202108, None)\n",
      "Total development loss: 17.077377565205097\n",
      "Development performance: (0.949876615278478, 0.949876615278478, 0.949876615278478, None)\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "446456c24c1443809e9a472e93c8bc21",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(IntProgress(value=0, description='Epoch 13', max=494, style=ProgressStyle(description_width='in…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Total training loss: 13.492025993036805\n",
      "Training performance: (0.9886845897238506, 0.9886845897238506, 0.9886845897238506, None)\n",
      "Total development loss: 19.90118957636878\n",
      "Development performance: (0.9481253482633268, 0.9481253482633268, 0.9481253482633268, None)\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "b37c259297374650955ab0b45cf71ed9",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(IntProgress(value=0, description='Epoch 14', max=494, style=ProgressStyle(description_width='in…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Total training loss: 12.34741208187188\n",
      "Training performance: (0.9893507826533231, 0.9893507826533231, 0.9893507826533231, None)\n",
      "Total development loss: 18.731272239238024\n",
      "Development performance: (0.9497970122323347, 0.9497970122323347, 0.9497970122323347, None)\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "d2796b99700548169493af184b534c8b",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(IntProgress(value=0, description='Epoch 15', max=494, style=ProgressStyle(description_width='in…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Total training loss: 10.979540771790198\n",
      "Training performance: (0.9901650184560116, 0.9901650184560116, 0.9901650184560116, None)\n",
      "Total development loss: 17.838901833223645\n",
      "Development performance: (0.949558203093905, 0.949558203093905, 0.949558203093905, None)\n",
      "Development F-score does not improve anymore. Stop training.\n"
     ]
    }
   ],
   "source": [
    "import math\n",
    "\n",
    "EMBEDDING_DIM = 300\n",
    "HIDDEN_DIM = 256\n",
    "NUM_CLASSES = len(label_field.vocab)\n",
    "MAX_EPOCHS = 50\n",
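    "PATIENCE = 3  # number of epochs without dev-set improvement to tolerate before stopping\n",
    "OUTPUT_PATH = \"/tmp/bilstmtagger\"  # where the best model is saved\n",
    "num_batches = math.ceil(len(train_data) / BATCH_SIZE)\n",
    "\n",
    "# VOCAB_SIZE + 2 leaves room for the two special tokens (<unk> and <pad>) in the vocabulary\n",
    "tagger = BiLSTMTagger(EMBEDDING_DIM, HIDDEN_DIM, VOCAB_SIZE+2, NUM_CLASSES, embeddings=embedding_matrix)\n",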
    "\n",
    "train_f, dev_f = train(tagger.to(device), train_iter, dev_iter, BATCH_SIZE, MAX_EPOCHS, \n",
    "                       num_batches, PATIENCE, OUTPUT_PATH)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's now plot the evolution of the F-score on our training and development sets to check visually whether training went well. If it did, the training F-score should rise steeply in the first epochs and then more gradually. The development F-score will also increase during the first few epochs, but at some point it stops improving or even starts to decrease again: that's when the model starts overfitting the training data. This is why we stop training once the development F-score has not improved for more than `PATIENCE` epochs, and why we only save the model when it reaches a new best F-score on the development data."
   ]
  },
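  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `train` function we called above is defined earlier in this notebook. The sketch below is a simplified, hypothetical illustration of the patience-based early stopping it performs, not its actual implementation; the function name `early_stopping_epoch` is made up. Given the development F-score after every epoch, it returns the epoch whose model would be saved (the best one so far) and the epoch in which training would stop, tolerating up to `patience` consecutive epochs without improvement. In the log above, for example, the best development F-score is reached in epoch 11, and training stops after four more epochs without improvement.\n",
    "\n",
    "```python\n",
    "def early_stopping_epoch(dev_scores, patience):\n",
    "    # Returns (best_epoch, stop_epoch); stop_epoch is None if training never stops early.\n",
    "    best_score = float('-inf')\n",
    "    best_epoch = None\n",
    "    epochs_without_improvement = 0\n",
    "    for epoch, score in enumerate(dev_scores):\n",
    "        if score > best_score:\n",
    "            # New best development F-score: this is where the model would be saved\n",
    "            best_score = score\n",
    "            best_epoch = epoch\n",
    "            epochs_without_improvement = 0\n",
    "        else:\n",
    "            epochs_without_improvement += 1\n",
    "            # Stop once we have waited more than `patience` epochs for an improvement\n",
    "            if epochs_without_improvement > patience:\n",
    "                return best_epoch, epoch\n",
    "    return best_epoch, None\n",
    "\n",
    "\n",
    "# Toy development F-scores: the best score is reached in epoch 2,\n",
    "# followed by four epochs without improvement\n",
    "scores = [0.90, 0.93, 0.95, 0.94, 0.945, 0.949, 0.948]\n",
    "print(early_stopping_epoch(scores, patience=3))  # (2, 6)\n",
    "```\n",
    "\n",
    "Because the model is only saved at a new best development score, the model we load later corresponds to the best epoch, not to the last one."
   ]
  },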
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX0AAAD8CAYAAACb4nSYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAAIABJREFUeJzt3Xd4VGXax/HvnZAAMYGEgKEEkoiIsNIDSpOirsC6Fmzoylp2F11Q0X15XWxbXBWsq6u4roXVfS2o2HDttEgXkCpdAiQBJfQSQtr9/nEmyWQSyACTnMnM/bmuc+XMKTP3BPKbZ55zznNEVTHGGBMeItwuwBhjTO2x0DfGmDBioW+MMWHEQt8YY8KIhb4xxoQRC31jjAkjFvrGGBNGLPSNMSaMWOgbY0wYqed2Ab6aNm2qqampbpdhjDF1ytKlS3eparPqtgu60E9NTWXJkiVul2GMMXWKiGz1Zzvr3jHGmDBioW+MMWHEQt8YY8JI0PXpV6WwsJDs7Gzy8/PdLsV1DRo0IDk5maioKLdLMcbUQXUi9LOzs4mLiyM1NRURcbsc16gqu3fvJjs7m7S0NLfLMcbUQX5174jIEBFZLyKbRGR8FetTRGSGiKwUkdkikuy17jERWe2Zrj2ZIvPz80lMTAzrwAcQERITE+0bjzHmpFUb+iISCUwChgIdgetEpKPPZk8C/1HVzsBDwATPvr8AugNdgXOBcSLS6GQKDffAL2W/B2PMqfCne6cXsElVNwOIyBTgMmCN1zYdgT945mcBH3kt/0ZVi4AiEVkJDAHeDUDtxhhT55SUwE8/wdatlafcXFi4EGqybedP6LcCsrweZ+O02r2tAIYDzwJXAHEikuhZ/mcReQqIAQZR8cPCGGNCSkEBZGdXHerbtjlTQcGx99+7F5o0qbn6AnUgdxzwvIjcBHwD5ADFqvqViPQE5gO5wAKg2HdnERkFjAJo06ZNgEoKrH379vHWW28xevToE9pv2LBhvPXWW8THx5/QfuvWrWPEiBGICFOnTqVt27YntL8xJvCKi53W+I4dsH27E+Degb51q7Nc9fjP07QppKQ4U5s25fMpKRAXV7PvwZ/QzwFaez1O9iwro6rbcVr6iEgscKWq7vOsewR4xLPuLWCD7wuo6kvASwDp6enV/LrcsW/fPl544YVKoV9UVES9esf+NX722Wcn9XofffQRV111FQ888MBJ7W+M8V9hodPlsmNH+bR9e+XHO3c6wX88ERGQnFw5zL1D/rTTaud9VcWf0F8MtBORNJywHwFc772BiDQF9qhqCXAvMNmzPBKIV9XdItIZ6Ax8dSoF11RfV3WfzOPHj+eHH36ga9euREVF0aBBAxISEli3bh0bNmzg8ssvJysri/z8fMaOHcuoUaOA8rGEDh06xNChQ+nXrx/z58+nVatWfPzxxzRs2LDSa3322Wc888wzREZGMmPGDGbNmlUTb9mYkFdc7IR5djbk5FQMcu/53NzqM6BU06bQooUztW5dOdRbtYJgvoym2tBX1SIRuR34EogEJqvq9yLyELBEVacBA4EJIqI43TtjPLtHAXM8Z5wcAG7wHNStcyZOnMjq1atZvnw5s2fP5he/+AWrV68uO19+8uTJNGnShCNHjtCzZ0+uvPJKEhMTKzzHxo0befvtt3n55Ze55ppreP/997nhhhsqvdawYcO47bbbiI2NZdy4cbXy/oypawoKnMDOzq445eSUz2/fXn3LHJzGZFKSE+QtW5aHuu/j5s0hOrrm31tN8qtPX1U/Az7zWfYnr/mpwNQq9svHOYMnYPz9NK5pvXr1qnCB1D/+8Q8+/PBDALKysti4cWOl0E9LS6Nr164A9OjRgy1bttRavcbUJQUFTh+5b6B7h/pPP/mXB82aOd0trVqVB7hvsCclwXF6aUNKmLzNwDvNq1Nu9uzZTJ8+nQULFhATE8PAgQOrvICqfv36
ZfORkZEcOXKkVmo1Jpjt3AkrV8KKFc60ciWsWeP0sx9PRIQT3q1aOaFe1dSyJXj92Rks9P0WFxfHwYMHq1y3f/9+EhISiImJYd26dSxcuLCWqzMm+BUWwrp15QFf+vPHH6vevk0bZyptpfsGevPm4dM6DyT7lfkpMTGRvn37cs4559CwYUOSkpLK1g0ZMoQXX3yRDh060L59e8477zwXKzXGfbm5Vbfeqzo/PTYWOneGLl2cqXNn6NTJWW4CTzRYOsk90tPT1ffOWWvXrqVDhw4uVRR87Pdh3KYK+/Y5B0q3b3f62deuLQ/4HTuq3q9t28oBn5rqdNWYUyMiS1U1vbrtrKVvjCmjCgcOlId56WmN3o9Lp6NHj/08pa330oAvbb3X9IVHpnoW+i4bM2YM8+bNq7Bs7Nix3HzzzS5VZEKZKvzwAyxdCllZVYd7Xp5/z9WoUcWzYc48szzg09Ks9R6sLPRdNmnSJLdLMCHs8GFYvBgWLHCmhQud/vbjiYkpP73ReyoN99J563Ovmyz0jQkRqpCZWR7wCxY4fey+Fyc1awbnnef0r1cV7HFxNTvKo3GXhb4xdVReHixZUjHkd+6suE1kJHTrBr17l09nnGGhHs4s9I2pA1SdERy9A375cijyGdSkadOKAd+zp7uDe5ngY6FvTBAqKnK6ZubOhTlzYN68yhcxRUQ4B069Q/7MM60Vb47PQv8k/eUvfwnYgGg2dr45cgQWLXICfu5cmD8fDh2quE2TJpVb8XYKpDlRFvpBwMbODz979jit9zlznGnp0spjzbRtC/36Qf/+0LcvtG9vrXhz6upc6Mtfa+Z/vf65+iuTH3nkEV5//XVOP/10WrduTY8ePfjhhx8YM2YMubm5xMTE8PLLL9OiRQs6d+5MZmYmERERHD58mLPPPpvNmzcT5TPQto2dHx62bStvxc+ZA99/X3G9CHTt6gR8v37O1LKlO7Wa0FbnQt8tS5cuZcqUKSxfvpyioiK6d+9Ojx49GDVqFC+++CLt2rVj0aJFjB49mpkzZ9K1a1cyMjIYNGgQ//3vf7n44osrBT7Y2PmhqKTEGZKgtBU/d64T+t7q14dzzy1vyffuDY0bu1OvCS91LvT9aZHXhDlz5nDFFVcQExMDwKWXXkp+fj7z58/n6quvLtvuqOfa9GuvvZZ33nmHQYMGMWXKlBO+t64JXqrOqZFbthx78h1ZOz7e6aIpbcmnp9uQv8YddS70g0lJSQnx8fEsX7680rpLL72U++67jz179rB06VIGDx7sQoXmZJxMqPtKTi5vxffvDz/7mQ1LYIKDhb6fzj//fG666SbuvfdeioqK+OSTT7j11ltJS0vjvffe4+qrr0ZVWblyJV26dCE2NpaePXsyduxYLrnkEiIjI91+C8aHKqxeDV9/DRs3nlioJyY6o0NWNaWk2Fk1JnhZ6Pupe/fuXHvttXTp0oXTTz+dnj17AvDmm2/y+9//nocffpjCwkJGjBhBly5dAKeL5+qrr2b27NkuVm68HToEM2fCZ585U1ZW1dtZqJtQZePp10H2+zgxGzeWh/zs2RVv5JGUBEOHQo8eFuqmbrPx9E3YOnoUMjLKg37jxvJ1Is5gY8OGwS9+4ZwmaX3tJpz4FfoiMgR4FogEXlHViT7rU4DJQDNgD3CDqmZ71j0O/AKIAL4Gxmqwfb2oJTZ2fs3JyioP+enTK44Jn5AAQ4Y4QT9kiDM+jTHhqtrQF5FIYBJwEZANLBaRaaq6xmuzJ4H/qOrrIjIYmACMFJE+QF+gs2e7ucAAYPaJFqqqSB2/HDEQY+eH6edlJUVFzqBjn37qBP2qVRXXd+3qhPywYc758HYDbWMc/vwp9AI2qepmABGZAlwGeId+R+APnvlZwEeeeQUaANGAAFHATydaZIMGDdi9ezeJiYl1PvhPhaqye/duGjRo4HYptU7V6abJyHBa8l995dyjtVRsLFx0kRPyQ4c6NwExxlTmT+i3ArzPccgGzvXZ
ZgUwHKcL6AogTkQSVXWBiMwCduCE/vOquvZEi0xOTiY7O5vc6m75EwYaNGhAcnKy22XUOFVYv9458JqR4Uy+N9tu397plx82zDkn3i52MqZ6gfrSOw54XkRuAr4BcoBiETkT6ACUptTXItJfVed47ywio4BRAG3atKn05FFRUaSlpQWoVBOMVJ2hC7xD/ief74TNmsHAgTBggNM3b4ORGnPi/An9HKC11+Nkz7Iyqrodp6WPiMQCV6rqPhH5HbBQVQ951n0O9Abm+Oz/EvASOKdsntxbMXVJSYkz6FhGhhP033xT+d6tzZs7AV8a9GefbaNMGnOq/An9xUA7EUnDCfsRwPXeG4hIU2CPqpYA9+KcyQOwDfidiEzA6d4ZADwToNpNHVJS4hxs9Q753bsrbtOyZcWQP+ssC3ljAq3a0FfVIhG5HfgS55TNyar6vYg8BCxR1WnAQGCCiChO984Yz+5TgcHAKpyDul+o6ieBfxsmGB05Am+/DdOmOSG/d2/F9cnJFUPe7vpkTM2rE1fkmrpl61Z44QV45RXnZiGl2rQpD/iBAyEtzULemECxK3JNrVKFWbPgueecln1JibM8PR1GjXJOp0xNdbVEYwwW+uYUHToEb7zhhP0az5UbUVFw3XVwxx3OhVHGmOBhoW9OyqZNMGkS/PvfsH+/s6xFC7jtNqdl37y5u/UZY6pmoW/8VlLiXAn73HPw+edOlw5Anz5Oq374cIiOdrdGY8zxWeibah04AK+95rTsN2xwltWvX96F0727q+UZY06Ahb45pnXr4Pnn4fXXnb57cE6zHD0afvtb5wpZY0zdYqFvKigudkatfO455zaCpQYMcFr1l11mI1YaU5fZn68BnCEQJk+Gf/0LMjOdZQ0bwg03wO23Q+fOx9/fGFM3WOiHMVWYOxf++U94//3y2wimpsKYMXDLLdCkiaslGmMCzEI/DO3fD//3f/Dii86gZ+BcGXvJJc4pl0OGQGSkuzUaY2qGhX4Y+e47J+jfegsOH3aWnX66c1B21CjnhuDGmNBmoR/ijhyBd95xunC+/bZ8+aBBTqv+8svt3HpjwomFfohav95p1b/2WvltBePj4cYbnbA/+2xXyzPGuMRCP4QUFsJHHzmt+lmzypf37Am//z1cey3ExLhXnzHGfRb6IWDbNnjpJXj1VfjxR2dZTAxcf73Tqu/Rw936jDHBw0K/Dlu7FsaPh//+t3wo444dnVb9yJHQuLG79Rljgo+Ffh2k6oyD87//C/n5zlDG117rtOr797cbkxhjjs1Cv47ZscO5aOqLL5zHN90Ejz3mnHppjDHVsdCvQz78EH73O+eG4k2aOEMmXHWV21UZY+qSCLcLMNU7dMi5gGr4cCfwL7oIVq60wDfGnDhr6Qe5hQudQc9++MEZw/6xx5zRLiPs49oYcxL8ig4RGSIi60Vkk4iMr2J9iojMEJGVIjJbRJI9yweJyHKvKV9ELg/0mwhFRUXw179Cv35O4HfuDEuWwNixFvjGmJNXbXyISCQwCRgKdASuE5GOPps9CfxHVTsDDwETAFR1lqp2VdWuwGAgD/gqgPWHpE2bnLD/y1+cUzHHjXOGUDjnHLcrM8bUdf60GXsBm1R1s6oWAFOAy3y26QjM9MzPqmI9wFXA56qad7LFhjpV5wKrrl1h0SLnLlXTp8MTTzhdO8YYc6r8Cf1WQJbX42zPMm8rgOGe+SuAOBFJ9NlmBPD2yRQZDnbtcg7U/va3zgiY117rHKwdPNjtyowxoSRQvcPjgAEisgwYAOQAxaUrRaQF0An4sqqdRWSUiCwRkSW5ubkBKqnu+OIL6NTJGTenUSN44w14+21ISHC7MmNMqPEn9HOA1l6Pkz3LyqjqdlUdrqrdgPs9y/Z5bXIN8KGqFlb1Aqr6kqqmq2p6szC62/aRI86ZOEOHOmPmnH++07r/1a/sqlpjTM3wJ/QXA+1EJE1EonG6aaZ5byAiTUWk9LnuBSb7PMd1WNdOBcuW
OQOhPf+8M4zCxIkwc6bdyMQYU7OqDX1VLQJux+maWQu8q6rfi8hDInKpZ7OBwHoR2QAkAY+U7i8iqTjfFDICWnkdVVzsnGt/7rnOgGlnn+2ci//HP9otCo0xNU9U1e0aKkhPT9clS5a4XUaNyMpyRr/M8Hz8jRkDjz9uY9wbY06diCxV1fTqtrMrcmvJwYNw4YWwYQMkJcHkyTBsmNtVGWPCjYV+LVCFW291Ar9TJ5gxA8LoeLUxJojYBf214KWXnFMwY2Phvfcs8I0x7rHQr2HLljnj5YAT/u3bu1uPMSa8WejXoAMH4Jpr4OhRGDUKrrvO7YqMMeHOQr+GqDpDKmzaBF26wDPPuF2RMcZY6NeYF15w+u/j4pyfDRu6XZExxljo14ilS+EPf3DmX3kF2rVztx5jjClloR9g+/c7/fgFBTB6tDNvjDHBwkI/gFThlltg82bo3h2eftrtiowxpiIL/QB67jn44ANneOR337Ubnxhjgo+FfoB8+61zW0Nwhlho29bdeowxpioW+gGwd6/Td19Y6IyPf+WVbldkjDFVs9A/Rapw882wdSv07Oncz9YYY4KVhf4p+vvf4eOPIT4e3nnH+vGNMcHNQv8UlN78BODf/4a0NHfrMcaY6ljon6Tdu51+/KIiuPtuuPxytysyxpjqWeifhJISuPFG505Y557r3N/WGGPqAgv9k/DUU/Dpp5CQ4PTjR0e7XZExxvjHQv8EzZsH997rzL/+OqSkuFuPMcacCAv9E7BrF1x7LRQXOxdi/fKXbldkjDEnxq/QF5EhIrJeRDaJyPgq1qeIyAwRWSkis0Uk2WtdGxH5SkTWisgaEUkNXPm1p6QERo6EnBzo0wcefdTtiowx5sRVG/oiEglMAoYCHYHrRKSjz2ZPAv9R1c7AQ8AEr3X/AZ5Q1Q5AL2BnIAqvbY89Bl98AYmJMGUKREW5XZExxpw4f1r6vYBNqrpZVQuAKcBlPtt0BGZ65meVrvd8ONRT1a8BVPWQquYFpPJa9M038MADzvz//R+0bu1uPcYYc7L8Cf1WQJbX42zPMm8rgOGe+SuAOBFJBM4C9onIByKyTESe8HxzqDN27oQRI5zunfHjYehQtysyxpiTF6gDueOAASKyDBgA5ADFQD2gv2d9T+AM4CbfnUVklIgsEZElubm5ASrp1BUXww03wI4d0K8f/O1vbldkjDGnxp/QzwG8OzSSPcvKqOp2VR2uqt2A+z3L9uF8K1ju6RoqAj4Cuvu+gKq+pKrpqprerFmzk3wrgffoo/D119C0qdOPX6+e2xUZY8yp8Sf0FwPtRCRNRKKBEcA07w1EpKmIlD7XvcBkr33jRaQ0yQcDa0697Jp34EB5y/6NN6CVb4eWMcbUQdWGvqeFfjvwJbAWeFdVvxeRh0TkUs9mA4H1IrIBSAIe8exbjNO1M0NEVgECvBzwd1EDli93xsfv0QMuvtjtaowxJjD86rBQ1c+Az3yW/clrfiow9Rj7fg10PoUaXfHdd87Pbt3crcMYYwLJrsg9htLQ717pCIQxxtRdFvrHYKFvjAlFFvpVyMuDtWshMhI617mOKWOMOTYL/SqsWuVcjNWhAzRs6HY1xhgTOBb6VbCDuMaYUGWhXwXrzzfGhCoL/SosW+b8tNA3xoQaC30fBQVOnz5A167u1mKMMYFmoe9jzRon+Nu1g0aN3K7GGGMCy0Lfhx3ENcaEMgt9H3YQ1xgTyiz0fdhBXGNMKLPQ91Jc7IyuCda9Y4wJTRb6XjZscIZgaNPGuXGKMcaEGgt9L3YQ1xgT6iz0vVh/vjEm1Fnoe7Ezd4wxoc5C30PVQt8YE/os9D0yM2H/fkhKghYt3K7GGGNqhoW+h/dBXBF3azHGmJpioe9hB3GNMeHAr9AXkSEisl5ENonI+CrWp4jIDBFZKSKzRSTZa12xiCz3TNMCWXwgWX++MSYc1KtuAxGJBCYBFwHZwGIRmaaqa7w2exL4j6q+LiKDgQnASM+6I6oa
1IMUq8LSpc68hb4xJpRVG/pAL2CTqm4GEJEpwGWAd+h3BP7gmZ8FfBTIImva9u2Qmwvx8ZCa6nY1xtQdhwsOsyB7AXO2ziGvMI/Y6Fi/p+jIaMQOoNU6f0K/FZDl9TgbONdnmxXAcOBZ4AogTkQSVXU30EBElgBFwERVDboPhNL+fDuIa8zxHSo4xLxt88jYmsHsLbNZvH0xRSVFJ/Vc9SLqERcdd8wPhRaxLRiYOpDzU86ncYPGAX4npybnQA4LshdQUFxAw3oNaRjV8Lg/G9RrEDQfcP6Evj/GAc+LyE3AN0AOUOxZl6KqOSJyBjBTRFap6g/eO4vIKGAUQJs2bQJUkv+sP9+Yqh04eoB52+Yxe8tsMrZmsGT7Eoq1uGx9hETQo0UPBqQMoHlscw4VHCqfCsvnDx49WGHdwYKDFJUUsTd/L3vz9x7z9Z9e+DQREkHPlj0ZnDaYwWmD6du6Lw2jGtbG2y+TtT+r7IMuY2sGm/ZsOuHnaFCvQdkHQUxUzDE/JF755Ss1+v78Cf0coLXX42TPsjKquh2npY+IxAJXquo+z7ocz8/NIjIb6Ab84LP/S8BLAOnp6Xoyb+RUWOiHr6z9Wby16i0uPONCerTs4XY5rtufv5852+aQsSWDjK0ZLN2xlBItKVsfKZH0bNmTgakDGZAygH5t+p10K7yguKDih4TPtG7XOmZmzmRRzqKyacLcCURHRtOndR8uSLuAwWmD6dmyJ1GRUYH6FQCwdd/WsoDP2JrB5r2bK6yPjY6lb+u+JDRM4EjhEY4UHTnuz6PFR8kvyie/KP+4H3IAky+dHND34ktUj5+xIlIP2ABcgBP2i4HrVfV7r22aAntUtUREHgGKVfVPIpIA5KnqUc82C4DLfA4CV5Cenq5Lliw55Td2Itq0gawsWLsWzj67Vl/auGTVT6t4Yv4TvL36bYpKioiKiOKJi57gznPvDJqv4bVh75G9ZSE/e+tslv+4vELI14uoR3rLdAakDGBg6kD6tu5LXP24Wq3xUMEh5mydw8zMmczInMHyH5ejlOdWbHQs56ecz+DUwVxwxgV0TupMhPh/NrqqsmXflrKQn71lNlv3b62wTaP6jejfpn/Z76Fbi27Ui/C/o6RES8gvyq/2AyKvMI9fdfrVSf0fFJGlqppe7XbVhb7nyYYBzwCRwGRVfUREHgKWqOo0EbkK54wdxeneGeMJ+j7Av4ASnNNDn1HVV4/3WrUd+rt2QbNmEBMDBw5AZGStvbSpZapKxtYMHp/3OJ9v+hxwuif6tO7D3G1zAbiq41W8eumrNKpft2+QXFRSxL78few5soe9R5wuFO+fOw7tYF7WPFb8uKJCgEZFRNGzVU8GpgxkQOoA+rTuQ2x0rIvvpLLdebvJ2JrBjM0zmLllJut2rauwvknDJgxKHVT2TeCsxLMqhKiq8sPeH8o+6DK2ZJB1IKvCc8Q3iKd/m/5l32i6Nu9KZERwh0NAQ7821Xbof/01/Pzn0KcPzJtXay9ralFxSTEfrP2AJ+Y/weLtiwGIiYrhN91+w93n3U1aQhpT10zllo9v4WDBQdo1acfUa6bSOamzy5U7CooLWPnTSnbn7a4U3nuO7CnrF/defrDgoF/PHRURxbnJ55aFfO/k3pwWfVoNv6PAyjmQw6wts8q+CWzbv63C+lZxrRicNpguSV347sfvyNiSQc7BCj3UJDRIYEDqAAakOFPnpM5BH/K+LPT99NhjMH483H47PPdcrb2sqQVHCo/w2vLXeGrBU/yw1zmM1DSmKXf2upPRPUeTGJNYYfuNuzdy1XtXsfKnlTSo14BJwyZxS7db3Ci9zKcbPmXsF2PL6veXIMQ3iCehYQIJDRLKf3rmExsmkt4ynfOSz6v1g6I1SVXZvHczMzJnMDNzJjMzZ5Kbl1tpu8SGiWUhPzB1IOecfs4JdQkFI39DP1Bn79RZdhA39OzO280Li1/guW+fK/uDPyPhDMb1HseNXW8kJiqmyv3aJbZj
4W8WcvtntzN5+WR+M+03zN02l+eHPX/MfWpK5t5M7vryLqatdy5iT2mcQrvEdhWCu6qfTRo2IaFhAo3qN6rzIXYyRIS2TdrStklbRvUYRYmW8P3O75mZOZPVO1fTtXlXBqQOoGOzjmH5+wFr6dOuHWza5Nwbt0uXWntZUwO27NvC0wue5tVlr5JXmAdAest07ulzD8M7DD+hr+uvLX+N0Z+O5kjRETqd3omp10zlrMSzaqr0MkcKj/D4vMeZOG8i+UX5xEXH8ZeBf+GOXncE/AwVE1qse8cP+/c7V+FGR8OhQxBlf1OnpERL2LpvK+t2rSub1u5aS15hHqnxqaTFp5GWkEZafBpnJJxBSnwKDeo1OOXXXbZjGU/Mf4J3v3+37BzyoWcO5Z6+9zAgZcBJn42z6qdVXPnulWzcs5G46DhevfRVrv7Z1adc77F8sv4Txn4xlsx9mQD8qtOveOKiJ2gRZ2N9m+pZ944fVqxwfnbqZIF/Io4UHmHD7g1loV4a8Ot3rye/KL/KfZbuWFrl8pZxLcs+BHw/FFrGtTxm61xVmb55Oo/Pf5zpm6cDzumFIzuNZFyfcQE5CNspqRNLRi3ht9N+y3tr3uOaqddw57Y7eeLnTxAdGX3Kz19q055N3PXFXXy68VPndU/vxPPDnuf8lPMD9hrGlArr0Lf+/GNTVXbl7arQYi+d37JvS4XT/Ly1iG3B2U3PLps6NO1AbHQsW/ZtIXNfJpv3biZzXyaZezPZtn8b2w9uZ/vB7czLqnzqVFREFCnxKc6HQekHQ0Ia+UX5/H3h31n+43LAOU97VPdR3HXeXbRu3LrS85yKRvUb8c5V79D/2/78z1f/wz++/QeLchbx7tXv0qbxqV09nleYx4Q5E3h8/uMUFBfQqH4jHhr4EGN6jTmhc8CNORFh/T/LQt9ResbDvKx5zM+az+qdq1m7ay17juypcvtIieTMJmeWhXppwLdv2p74BvFV7tO7de9Ky4pKisg+kO18EOzNdD4MPB8Im/du5qfDP7Fpz6ZjXvKedFoSY88dy23pt5HQMOHkfwHVEBHuOPcOerXqxTVTr2FRziK6/asbb1zxBkPbDT3h51NVPlr3EXd/eXfZRUC/7vJrHrvwMZrHNg90+cbYqjbvAAAQiklEQVRUENZ9+uecA99/D4sWQa9etfKSQaGwuJDlPy5n7ra5zMuax7ysefx46MdK28VFxznB3qwDZyeWt97bNmkb0O6NY8krzHO+IXg+EEq/JRw4eoARPxvByC4jA3JM4ETsztvNyA9Hll3cdX//+/nrwL/6fZB4w+4N3Pn5nXz5w5cAdEnqwqRhk+jbpm+N1WzCgx3IrUZeHsTFOaNqHjwIDUPnVOVK9ufvZ0H2AuZtcwJ+Uc6isrNbSiU2TKRfm370bd2X7i2606FZB1rEtgirIQn8VaIlTJw7kQdnPUiJljA4bTBvDX+LpNikY+5zuOAwj8x5hKcWPEVBcQGN6zfm4cEPc1v6bdaVYwLCDuRWY9UqKClxWvuhFPiqyrb925wW/LZ5zM2ay6qfVlXqgz8r8Sz6tu5bFvS+l6qbY4uQCO7rfx+9k3tz3fvXMTNzJt3+1Y0pV02pdPBVVXl/7fv84cs/lF3qf3PXm5l44UROP+10N8o3YS5sQz9U+vOLS4pZ+dNK5mXNK+uuyT6QXWGbqIgoerTsQb/W/ejbpi99WvexwAmAQWmDWHbrMka8P4Jvtn7D4NcH8+gFjzKuzzgiJIJ1u9Zxx+d3lJ1d1K15NyYNm1Tl8Q1jaouFfh0LfVVl9c7VzMicwfTN0/lm6zeVxlmJbxBP39Z9y1ry6S3TQ+pS+2DSIq4FM349gwdnPsjEeRP54/Q/MnfbXDo07cDfF/6dwpJCEhok8MjgRxjVY1SdG8/FhB4L/ToQ+lv3bWX65unMyJzBjMwZ7Dy8s8L6MxLOqNBV06FZh7C9xNwN9SLqMeHCCfRt05dff/hrPtnwCZ9s+ARB+G233zLh
wgk0jWnqdpnGAGEa+gUFsHq1Mx+MQy/sytvFrMxZZa1538G2Wsa15IK0C7jwjAsZnDaY5EbJLlVqvF1y1iV8d+t33PLxLRSVFPHkz5+kV6swOi3M1AlhGfpr1jjB364dNAqCYdMPFxxm7ra5Za1535tENK7fmEFpg8qCvn1iezvoGqRS41OZeeNMt8sw5pjCMvTd7topLC5k8fbFzNg8g+mZ01mQtYDCksKy9fUj69O3TV8uTLuQC864gO4tuttpfcaYgAjLJHEj9FWVD9Z+wGsrXiNjS0aFg6+C0LNlz7KWfJ/WfezAqzGmRoRl6C9b5vzs1q12Xi9jSwb3TL+Hb3O+LVvWPrF9WcgPTB1Yo8MIGGNMqbAL/eJiZ+x8qPnQX71zNeOnjy8bPTHptCTu738/V3S4wg6+GmNcEXahv2GDMwRDmzbQtIbOoss+kM2fZv2J11e8TomWEBsdyz197uHu3ncH3U2mjTHhJexCvyb78/fl72Pi3Ik8u+hZ8ovyqRdRj9Hpo3lwwIN2BawxJij4FfoiMgR4FogEXlHViT7rU4DJQDNgD3CDqmZ7rW8ErAE+UtXbA1T7SamJ0D9adJRJiyfxyJxHyoYjvuZn1/DI4Ec4s8mZgXshY4w5RdWGvohEApOAi4BsYLGITFPVNV6bPQn8R1VfF5HBwARgpNf6vwHfBK7skxfIg7glWsKbK9/kwVkPlo2LPjB1II9f+Dg9W/U89RcwxpgA86el3wvYpKqbAURkCnAZTsu9VEfgD575WcBHpStEpAeQBHwBVDvsZ01SDUxLX1X56oev+OP0P7LiJ+eei+ecfg6PXfgYQ88cahdOGWOClj+h3wrI8nqcDZzrs80KYDhOF9AVQJyIJAJ7gaeAG4ALT7naU5SZ6dwMPSkJWpzkvaaXbl/KH6f/kRmZMwBIbpTM3wb9jZGdR9pgWsaYoBeoA7njgOdF5CacbpwcoBgYDXymqtnHa/2KyChgFECbNqd239Hj8W7ln2hjfPPezTww8wHeXv024AyNcF//+7ij1x12IZUxps7wJ/RzAO+7TSd7lpVR1e04LX1EJBa4UlX3iUhvoL+IjAZigWgROaSq4332fwl4CZw7Z53sm6lOaX/+iXTt5B7O5eFvHuafS/5JYUkh0ZHR3NHrDu7rfx9NGjapmUKNMaaG+BP6i4F2IpKGE/YjgOu9NxCRpsAeVS0B7sU5kwdV/ZXXNjcB6b6BX5tKW/r+HsR9duGzPDjrQQ4WHEQQRnYeyd8G/Y2U+JSaK9IYY2pQtaGvqkUicjvwJc4pm5NV9XsReQhYoqrTgIHABBFRnO6dMTVY80lRhaVLnXl/WvpTVk/hri/vAuDithfz2IWP0aV5EI7DbIwxJyBsboyekwPJyRAfD3v2HL9PP3NvJl3/1ZUDRw/w7JBnufPcOwNejzHGBJK/N0YPm9sr+XsQt7C4kOvev44DRw9wxdlXcEevO2qnQGOMqQVhE/r+HsT98+w/syhnEcmNknnl0lfsnHtjTEgJm9D35yDujM0zmDh3IhESwZvD37Szc4wxISfsQv9YLf3cw7mM/HAkivLg+Q9yfsr5tVecMcbUkrAI/V27ICsLTjvNuS+uL1Xl5o9vZsehHfRr048Hzn+g9os0xphaEBahX9qf37UrRFYxUsJz3z7Hpxs/Jb5BPG8Of9PuR2uMCVlhEfrH69pZ/uNy/vfr/wXglV++QpvGNTcMhDHGuC2sQt/3IO7hgsOMmDqCguICbu1xK1d2vLL2izPGmFoUVqHv29If+8VY1u9eT8dmHXn64qdrvzBjjKllIR/6+/fDpk0QHQ0dO5Yvf2f1O7y67FXqR9ZnypVTiImKca9IY4ypJSEf+iuce5zQuTNERTnzmXszGfXfUQA8ffHTdErq5FJ1xhhTu0I+9H378wuLC7n+g+s5cPQAl7W/jN+n/9694owxppaFTeiX9uf/NeOvLMxeSKu4Vrx66as2zIIxJqyEVejPypzFo3Me
RRDeHP4miTGJ7hZnjDG1LKSvQsrLg7VrnQuyWrTdxXmv31A2zMKA1AFul2eMMbUupFv6q1ZBSQl06KiM+eoWth/cTp/WffjTgD+5XZoxxrgipEO/tGsnZsAkPtnwCY3rN+at4W/ZMAvGmLAV+qGftILvmo4D4OVfvmz3tzXGhLWQDv3FKw7DVSMo4ii/6/47rv7Z1W6XZIwxrgrZ0C8ogFUt74Zm62jfpAPPDHnG7ZKMMcZ1IRv6z379HiXdXkaK6/PO1W/bMAvGGIOfoS8iQ0RkvYhsEpHxVaxPEZEZIrJSRGaLSLLX8u9EZLmIfC8itwX6DVRl676t/Hnp7wDolvskXZp3qY2XNcaYoFdt6ItIJDAJGAp0BK4TkY4+mz0J/EdVOwMPARM8y3cAvVW1K3AuMF5EWgaq+KoUlRRx/QfXc0T3w/pfck3qmJp8OWOMqVP8aen3Ajap6mZVLQCmAJf5bNMRmOmZn1W6XlULVPWoZ3l9P1/vlDyU8RDzs+YTnd8SPp5Mjx42zIIxxpTyJ4RbAVlej7M9y7ytAIZ75q8A4kQkEUBEWovISs9zPKaq20+t5GObvWU2D3/zMIIgH70BeU0r3TjFGGPCWaBa3uOAASKyDBgA5ADFAKqa5en2ORO4UUSSfHcWkVEiskREluTm5p5UAbvzdnPDB84wC6M63MfRdYNISYFEG17HGGPK+BP6OUBrr8fJnmVlVHW7qg5X1W7A/Z5l+3y3AVYD/X1fQFVfUtV0VU1v1qzZCb4FxwuLXyDnYA69k3vTu+DPQOXbIxpjTLjzZzyCxUA7EUnDCfsRwPXeG4hIU2CPqpYA9wKTPcuTgd2qekREEoB+wN8DWH+Z+8+/n9joWK7ocAXP/c25W0pVN0I3xphwVm3oq2qRiNwOfAlEApNV9XsReQhYoqrTgIHABBFR4Bug9JSZDsBTnuUCPKmqq2rgfRAhEdzd+24Ali1zllnoG2NMRaKqbtdQQXp6ui5ZsuSk91eFhATn3rjbt0OLFgEszhhjgpSILFXV9Oq2C7krcjMzncBv3twC3xhjfIVc6PveE9cYY0y5kA196883xpjKQi707SCuMcYcW0iFviosXerMW+gbY0xlIRX627dDbq5z9k6K3SDLGGMqCanQ9z6IKzbOmjHGVBJSoW/9+cYYc3whFfp25o4xxhyfhb4xxoSRkAn93FzIyoLTToN27dyuxhhjglPIhH5pf37XrhARMu/KGGMCK2Ticd8+SEqyrh1jjDkef8bTrxOuucaZCgrcrsQYY4JXyLT0S0VHu12BMcYEr5ALfWOMMcdmoW+MMWHEQt8YY8KIhb4xxoQRC31jjAkjFvrGGBNGLPSNMSaMiKq6XUMFIpILbD2Fp2gK7ApQOTUh2OuD4K8x2OsDqzEQgr0+CK4aU1S1WXUbBV3onyoRWaKq6W7XcSzBXh8Ef43BXh9YjYEQ7PVB3ajRl3XvGGNMGLHQN8aYMBKKof+S2wVUI9jrg+CvMdjrA6sxEIK9PqgbNVYQcn36xhhjji0UW/rGGGOOIWRCX0SGiMh6EdkkIuPdrseXiLQWkVkiskZEvheRsW7XVBURiRSRZSLyX7drqYqIxIvIVBFZJyJrRaS32zV5E5G7Pf++q0XkbRFpEAQ1TRaRnSKy2mtZExH5WkQ2en4mBGGNT3j+nVeKyIciEh9sNXqt+x8RURFp6kZtJyIkQl9EIoFJwFCgI3CdiHR0t6pKioD/UdWOwHnAmCCsEWAssNbtIo7jWeALVT0b6EIQ1SoirYA7gXRVPQeIBEa4WxUArwFDfJaNB2aoajtghuexm16jco1fA+eoamdgA3BvbRfl4zUq14iItAZ+Dmyr7YJORkiEPtAL2KSqm1W1AJgCXOZyTRWo6g5V/c4zfxAnrFq5W1VFIpIM/AJ4xe1aqiIijYHzgVcBVLVAVfe5W1Ul9YCGIlIPiAG2u1wPqvoNsMdn8WXA657514HLa7UoH1XVqKpfqWqR5+FC
ILnWC6tYT1W/R4C/A/cAdeIAaaiEfisgy+txNkEWqN5EJBXoBixyt5JKnsH5z1vidiHHkAbkAv/2dEG9IiKnuV1UKVXNAZ7EafHtAPar6lfuVnVMSaq6wzP/I5DkZjF+uAX43O0ifInIZUCOqq5wuxZ/hUro1xkiEgu8D9ylqgfcrqeUiFwC7FTVpW7Xchz1gO7AP1W1G3AY97slynj6xS/D+XBqCZwmIje4W1X11DmFL2hbqSJyP0736Jtu1+JNRGKA+4A/uV3LiQiV0M8BWns9TvYsCyoiEoUT+G+q6gdu1+OjL3CpiGzB6R4bLCJvuFtSJdlAtqqWfkOaivMhECwuBDJVNVdVC4EPgD4u13QsP4lICwDPz50u11MlEbkJuAT4lQbf+eVtcT7gV3j+bpKB70SkuatVVSNUQn8x0E5E0kQkGufg2TSXa6pARASnL3qtqj7tdj2+VPVeVU1W1VSc399MVQ2qVqqq/ghkiUh7z6ILgDUuluRrG3CeiMR4/r0vIIgONPuYBtzomb8R+NjFWqokIkNwuhsvVdU8t+vxpaqrVPV0VU31/N1kA909/0+DVkiEvudgz+3Alzh/ZO+q6vfuVlVJX2AkTgt6uWca5nZRddAdwJsishLoCjzqcj1lPN9ApgLfAatw/r5cv2JTRN4GFgDtRSRbRH4DTAQuEpGNON9QJgZhjc8DccDXnr+XF4OwxjrHrsg1xpgwEhItfWOMMf6x0DfGmDBioW+MMWHEQt8YY8KIhb4xxoQRC31jjAkjFvrGGBNGLPSNMSaM/D+6R7bEl3QQyQAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "%matplotlib inline\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "import pandas as pd\n",
    "\n",
    "# Collect the F-scores per epoch in a DataFrame\n",
    "df = pd.DataFrame({'epochs': range(len(train_f)),\n",
    "                   'train_f': train_f,\n",
    "                   'dev_f': dev_f})\n",
    "\n",
    "# Plot the training and development F-score per epoch\n",
    "plt.plot('epochs', 'train_f', data=df, color='blue', linewidth=2)\n",
    "plt.plot('epochs', 'dev_f', data=df, color='green', linewidth=2)\n",
    "plt.legend()\n",
    "plt.show()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We first load the best model we saved during training from `OUTPUT_PATH`. Before we test it on the test data, we have to call its `eval()` method. This puts the model in evaluation mode, which deactivates dropout layers and other functionality that is only useful during training."
   ]
  },
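  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see what `eval()` changes in practice, the sketch below applies a standalone `nn.Dropout` layer with the same probability as our tagger's `dropout_layer` (`p=0.5`) to a tensor of ones, first in training mode and then in evaluation mode. It is only an illustration and does not touch `tagger`. In training mode, dropout zeroes a random subset of the values and scales the others by 1/(1-p) = 2; in evaluation mode, it returns its input unchanged.\n",
    "\n",
    "```python\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "\n",
    "torch.manual_seed(0)\n",
    "dropout = nn.Dropout(p=0.5)  # same dropout probability as the tagger's dropout_layer\n",
    "x = torch.ones(10)\n",
    "\n",
    "dropout.train()  # training mode: random values are zeroed, the rest scaled by 1/(1-p)\n",
    "print(dropout(x))  # each value is either 0 or 2\n",
    "\n",
    "dropout.eval()  # evaluation mode: dropout does nothing\n",
    "print(dropout(x))  # all ones\n",
    "print(torch.equal(dropout(x), x))  # True\n",
    "```\n",
    "\n",
    "Calling `tagger.eval()` sets this mode recursively on all submodules, including the `dropout_layer` of our BiLSTM tagger."
   ]
  },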
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "BiLSTMTagger(\n",
       "  (embeddings): Embedding(20002, 300)\n",
       "  (lstm): LSTM(300, 256, bidirectional=True)\n",
       "  (dropout_layer): Dropout(p=0.5)\n",
       "  (hidden2tag): Linear(in_features=512, out_features=11, bias=True)\n",
       ")"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tagger = torch.load(OUTPUT_PATH)\n",
    "tagger.eval()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
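    "Finally, we test the model on the test data. Note that the scores below only cover the entity labels, while the scores printed during training are micro-averaged over all labels (hence the identical precision, recall and F-score), including the very frequent `O` label; this largely explains why those were so much higher. You'll notice that the performance of our tagger (a weighted average F-score of 0.67) is significantly lower than that of the CRF we explored in an earlier notebook. Designing a competitive neural network takes considerably more effort than we put in here: you'll need to make the architecture of the network more complex, optimize its hyperparameters, and often also train it on considerably more data."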
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "             precision    recall  f1-score   support\n",
      "\n",
      "      B-LOC       0.83      0.67      0.75       774\n",
      "      I-LOC       0.42      0.45      0.44        49\n",
      "     B-MISC       0.83      0.48      0.60      1187\n",
      "     I-MISC       0.58      0.25      0.35       410\n",
      "      B-ORG       0.72      0.56      0.63       882\n",
      "      I-ORG       0.74      0.57      0.64       551\n",
      "      B-PER       0.82      0.68      0.74      1098\n",
      "      I-PER       0.95      0.71      0.81       807\n",
      "\n",
      "avg / total       0.80      0.58      0.67      5758\n",
      "\n"
     ]
    }
   ],
   "source": [
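    "# Skip the first three entries of the label vocabulary (the two special tokens and O),\n",
    "# so that only the entity labels are evaluated\n",
    "labels = label_field.vocab.itos[3:]\n",
    "# Group the B- and I- labels of each entity type together\n",
    "labels = sorted(labels, key=lambda x: x.split(\"-\")[-1])\n",
    "label_idxs = [label_field.vocab.stoi[l] for l in labels]\n",
    "\n",
    "test(tagger, test_iter, BATCH_SIZE, labels=label_idxs, target_names=labels)"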
   ]
  },
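  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The label list we passed to `test` is built in two steps. First, `itos[3:]` drops the first three entries of the label vocabulary: our tagger has 11 output classes but the report lists only 8 entity labels, so the dropped entries are presumably the two special tokens `<unk>` and `<pad>`, and the `O` label. Second, `sorted` groups the labels by entity type, using the part of each label after the hyphen as the key; since Python's sort is stable, labels of the same type keep their original relative order. The sketch below repeats both steps on a hand-written, hypothetical vocabulary, so its order is not necessarily the same as that of the real `label_field.vocab.itos`.\n",
    "\n",
    "```python\n",
    "# Hypothetical label vocabulary: two special tokens and O first, then the entity labels\n",
    "itos = ['<unk>', '<pad>', 'O', 'B-PER', 'B-LOC', 'I-PER', 'B-ORG',\n",
    "        'B-MISC', 'I-ORG', 'I-MISC', 'I-LOC']\n",
    "stoi = {label: idx for idx, label in enumerate(itos)}\n",
    "\n",
    "labels = itos[3:]  # keep only the entity labels\n",
    "labels = sorted(labels, key=lambda x: x.split('-')[-1])  # group by entity type\n",
    "label_idxs = [stoi[l] for l in labels]\n",
    "\n",
    "print(labels)      # ['B-LOC', 'I-LOC', 'B-MISC', 'I-MISC', 'B-ORG', 'I-ORG', 'B-PER', 'I-PER']\n",
    "print(label_idxs)  # [4, 10, 7, 9, 6, 8, 3, 5]\n",
    "```\n",
    "\n",
    "This is why the report above lists each entity type with its B- and I- label next to each other, and leaves out `O`."
   ]
  },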
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Conclusion\n",
    "\n",
    "In this notebook we've trained a simple bidirectional LSTM for named entity recognition. Our aim was not to achieve state-of-the-art performance, but to understand how neural networks can be implemented and trained in PyTorch. A common way to improve performance is to add a CRF layer on top of the network. This layer optimizes the complete label sequence rather than each label individually. We leave that for future work."
   ]
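  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the idea of optimizing the complete label sequence a little more concrete, here is a minimal, hypothetical sketch of Viterbi decoding, the dynamic-programming algorithm a CRF layer uses to find the best label sequence at prediction time. It assumes a matrix of per-token label scores (such as the output of our `hidden2tag` layer) and a matrix of label-to-label transition scores, which a CRF layer would learn; the function name and the toy numbers are made up for illustration. Training a CRF layer additionally requires the forward algorithm to compute the likelihood of the correct sequence, which we don't show here.\n",
    "\n",
    "```python\n",
    "import torch\n",
    "\n",
    "def viterbi_decode(emissions, transitions):\n",
    "    # emissions: (seq_len, num_tags) score of each tag for each token\n",
    "    # transitions: (num_tags, num_tags), transitions[i, j] = score of tag i followed by tag j\n",
    "    seq_len, num_tags = emissions.shape\n",
    "    score = emissions[0]  # best score of a path ending in each tag at position 0\n",
    "    backpointers = []\n",
    "    for t in range(1, seq_len):\n",
    "        # candidate[i, j]: best path ending in tag i at t-1, extended with tag j at t\n",
    "        candidate = score.unsqueeze(1) + transitions + emissions[t].unsqueeze(0)\n",
    "        score, best_prev = candidate.max(dim=0)\n",
    "        backpointers.append(best_prev)\n",
    "    # Start from the best final tag and follow the backpointers\n",
    "    best_tag = int(score.argmax())\n",
    "    path = [best_tag]\n",
    "    for best_prev in reversed(backpointers):\n",
    "        best_tag = int(best_prev[best_tag])\n",
    "        path.append(best_tag)\n",
    "    path.reverse()\n",
    "    return path, float(score.max())\n",
    "\n",
    "# Toy example with tags 0 = O, 1 = B-PER, 2 = I-PER\n",
    "emissions = torch.tensor([[2.0, 1.5, 0.0],\n",
    "                          [0.0, 0.0, 1.0],\n",
    "                          [1.0, 0.0, 0.0]])\n",
    "transitions = torch.zeros(3, 3)\n",
    "transitions[0, 2] = -10.0  # O followed by I-PER is heavily penalized\n",
    "\n",
    "print(emissions.argmax(dim=1).tolist())        # [0, 2, 0]: per-token choice, O I-PER O\n",
    "print(viterbi_decode(emissions, transitions))  # ([1, 2, 0], 3.5): B-PER I-PER O\n",
    "```\n",
    "\n",
    "Choosing the best label for every token separately yields O I-PER O, which is not well-formed in the labelling scheme of our data, where an I-PER label should follow a B-PER or I-PER label. Viterbi decoding scores the sequence as a whole and prefers B-PER I-PER O, even though B-PER has a lower score than O for the first token on its own."
   ]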
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
